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Abstract —Power systems are developing very fast nowadays, 
both in size and in complexity; this situation is a challenge 
for Early Event Detection (EED). This paper proposes a data- 
driven unsupervised learning method to handle this challenge. 
Specifically, the random matrix theories (RMTs) are introduced 
as the statistical foundations for random matrix models (RMMs); 
based on the RMMs, linear eigenvalue statistics (LESs) are 
defined via the test functions as the system indicators. By 
comparing the values of the LES between the experimental 
and the theoretical ones, the anomaly detection is conducted. 
Furthermore, we develop 3D power-map to visualize the LES; 
it provides a robust auxiliary decision-making mechanism to the 
operators. In this sense, the proposed method conducts EED 
with a pure statistical procedure, requiring no knowledge of 
system topologies, unit operation/control models, etc. The LES, 
as a key ingredient during this procedure, is a high dimensional 
indictor derived directly from raw data. As an unsupervised 
learning indicator, the LES is much more sensitive than the low 
dimensional indictors obtained from supervised learning. With 
the statistical procedure, the proposed method is universal and 
fast; moreover, it is robust against traditional EED challenges 
(such as error accumulations, spurious correlations, and even 
bad data in core area). Case studies, with both simulated data 
and real ones, validate the proposed method. To manage large- 
scale distributed systems, data fusion is mentioned as another 
data processing ingredient. 

Index Terms —big data, early event detection, data-driven, 
unsupervised learning, smart grid, random matrix model, linear 
eigenvalue statistics, 3D power-map, data fusion 

I. Introduction 

D ATA have become a strategic resource for power systems. 

Data are readily accessible caused by developments of 
various technologies and devices, such as Information Com¬ 
munication Technology (ICT), Advanced Metering Infrastruc¬ 
ture (AMI), Phasor Measurement Units (PMUs), Intelligent 
Electronic Devices (IEDs), Supervisory Control and Data 
Acquisition (SCADA) (T|. As a result, data with features of 
volume, velocity, variety, and veracity (i.e. 4Vs data) O, as 
well as curses of dimensionality (3), are inevitably generated 
and daily aggregated in power systems. 

Early event detection (EED), by continuously monitoring 
and processing 4Vs data, detects and identifies emerging 
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patterns of anomalies in power systems to make corresponding 
emergency responses [4 ]. In other words, EED aims to tell 
signals from noises—we treat receivable sample errors and 
irregular fluctuations (from distributed generators and small 
loads) as noises; whereas we treat system faults, network 
reconfigurations, and dramatic changes from loads/generators 
(often unintended) as signals. A smart grid, especially the one 
large in scale, is a complex big data system essentially 0113. 
For such a system, it is a big challenge to conduct EED within 
a tolerable elapsed time and hardware resources. 

The integration of big data analytics and unsupervised 
learning mechanism is an effective approach to this challenge. 
For the former , big data analytics is a scientific trend dealing 
with complex data EE). It is a data-driven tool and aims to 
work out statistical characteristics (especially correlations) in¬ 
dicated by linear eigenvalue statistics (LESs) i). That means, 
it conducts data processing in high dimensions, rather than 
builds and analyzes individual models based on assumptions 
and simplifications, to help understand and gain insight into 
the systems ma. Big data analytics has already been suc¬ 
cessfully applied in numerous phenomena, such as quantum 
systems DD, financial systems fl2) . biological systems ed, 
wireless communication networks dHO; we believe 
that it will also have a wide applied scope in power systems 
ECEHE). For the latter , the supervised learning methods 
are prevailing in data processing. The key parts are the in¬ 
ferred functions and empirical models; these functions/models, 
produced via a subjective training procedure (often in low 
dimensions), lead to a determinate parameter as the system 
indicator ED. However, for a complex system, it is hard to 
find a convincing training way to ensure the validity of the last 
indicator; besides, it is impossible to train all the scenarios to 
infer an event identification framework which is robust enough 
to manage all the 4Vs data. In contrast, the unsupervised 
method proposed in this paper utilizes the data in the form 
of random matrix models (RMMs), which are derived from 
the raw data in a statistical manner. Hence, the unsupervised 
methods is more suitable for EED in smart grids. 

A. Contribution 

This paper proposes a novel data-driven unsupervised learn¬ 
ing method to conduct EED in smart grids, and a comparison 
is made with supervised ones (e.g. a dimensionality reduction 
method based on Principal Component Analysis (PCA) IE))- 
First, random matrix theories (RMTs) are briefly introduced 
as the solid mathematic foundations for RMMs. Built on 
RMMs, two major data processing ingredients—LES designs 
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and data fusion—are systematically studied. 1) LESs are high 
dimensional statistical indicators; with different test functions, 
LESs gain insight into the systems from different perspec¬ 
tives. Moreover, some theoretical values related to LESs are 
predictable as the reference points via the latest theorems. 
Additionally, 3D power-map is developed to visualize the LES 
for the decision-making auxiliary functions. This visualization 
is sensitive to events—it is able to detect serious system faults, 
as well as some small fluctuations; moreover, the visualization 
is robust against bad data—even with data loss in the core area, 
we can still achieve the proper judgements. 2) Besides, data 
fusion, by putting together diverse data sources, provides us 
a comprehensive view towards systems. It is a deep research; 
we just give a brief mention. In general, based on RMTs and 
the big data applying architecture 0, this paper presents a 
series of work associated with EED: the theorems of LESs, 
a briefly discussion about data fusion, the indicators derived 
from LESs, and the visualization of the results. Case studies, 
with both simulated data and real ones, validate the proposed 
method, and related theories and theorems. 


B. Related Work 

1) Our Previous Work: Paper (6J is the first one; it is 
also the first attempt to apply big data analytics systemati¬ 
cally to power systems. It provided a feasible architecture as 
the approach with two independent procedures—engineering 
procedure and mathematical one. Random matrix theories 
(RMTs), such as Ring Law and M-P Law, were elaborated 
as the solid foundations; Mean spectral radius (MSR) was 
proposed to indicate the statistical correlations (now we know 
that MSR is a specific LES). Specifically, moving split- 
window (MSW) technology was introduced to deal with data 
in temporal dimensions; in spatial dimensions, block calcu¬ 
lation was presented. Simulated data in various fields were 
studied as cases. Then we moved forward to the second stage. 
Paper (201, based on the first one (specially, Ring Law, M-P 
Law, the architecture, MSR, MSW, the simulated 118-buses 
system), talked about the correlation analysis approaches. An 
augmented matrix was studied as the key ingredient. Actually, 
augmented matrix is a form of data fusion—it consists of 
two data sources with totally different sizes and meanings: 
the status one for the basic part, and the factor one for the 
augmented part. As the same stage work, paper [21 j, using the 
70 nodes network testbed, tested some data fusion ingredients 
based on RMTs: the product of non-hermitian random matri¬ 
ces, the geometric mean, the arithmetic mean, and the product 
of random Ginibre matrices. With the experimental data, the 
effectiveness of these data fusion ingredients, is validated for 
signal detection in Massive MIMO systems. 

2) Others’ Work: Data are a core resource and data man¬ 
agement is the keypoint for EED in smart grids. There have 
been numerous discussions about utilizing PMUs to improve 
wide-area monitoring, protection and control (WAMPAC) l22l 
[241. Xu initiated power disturbance data analytics, and showed 
a wide scope in the future (T|. The mathematical foundations, 
system frameworks, and applying architectures are missing 
yet. Recently, Xie proposed an early event detection algorithm 


based on principal component analysis (PCA) m. However, 
it is a supervised learning method; the bad subspace (i.e., linear 
combinations of the labelled data from several empirical cho¬ 
sen PMUs named as pilot PMUs), caused by improper training 
procedure, will lead to bad results. Although many researches 
(especially those methods based on specific physical models) 
were done in various fields, little attention has been paid 
to the data-driven unsupervised learning methods, which are 
based on solid mathematical foundations and with universal 
statistical procedures. Additionally, related mathematical work 
is introduced in Sec II. 


II. Random Matrix Theories 
The nomenclature is given as Table [I] 

TABLE I: Some Lrequently Used Notations in the Theories 


Notations 

Meanings 

X,x,x ; x^j 

a matrix, a vector, a single value, an entry of a matrix 

X,x,£ 

hat: raw data 

X,x,x,Z 

tilde: intermediate variables, formed by normalization 

N, T, c 

the numbers of rows and columns; c = N/T 

c NxT 

N xT dimensional complex space 

X u 

the singular value equivalent of X 

s 

covariance matrix of X: S = ^XX^ G C NxN 

M 

another form of covariance matrix of X: M = cS 

Z ,L 

L independent matrices product: Z = YliLi^u,i 

As, A^, Am 

the eigenvalue of matrix S, Z, M 

As,z 

the i- th eigenvalue of matrix S 

r 

the circle radius on the complex plane of eigenvalues 

T 

linear eigenvalue statistics 

r MSR 

mean value of radius for all eigenvalues of Z: u ( r \^) 

<P,<P 

the test function and its Fourier transformation 

X 

random variable 

E(X),D(X) 

expectation, variance for X 

M(x),<7 2 (x) 

mean, variance for x 

X° 

X-E(X) 

Ki 

z-th cumulant of a random variable X 

K mtel 

m)-c(e 2 ) 


A. From Physical System to Random Matrix 

Lor a system, we assume t times observation for n- 
dimensional vectors^,x 2 ,- • • ,x t (x^ G C nxl , j = 1,- • •, t), and 
a data source, denoted as Ft (in size of n x t), is obtained. Ft is 
in a high-dimensional space but not an infinite one (Or more 
explicitly, we are interested in the practical regime in which 
n=100-10000, and t is sufficient large); this disables most 
classical tools. In contrast, RMTs enable us to select arbitrary 
data—both in temporal dimensions (e.g. T from t) and in 
spatial dimensions (e.g. N from n )—to form X G C NxT 
naturally; X is a random matrix due to the presence of 
ubiquitous noises. Lurthermore, we can convert X into a 
normalized matrix X row-by-row (see (18)); thus, the random 
matrix model (RMM) is built to map the system. 

In our previous work (6l [2Qll2T1l . we have already elaborated 
RMTs, Ring Law, M-P Law, and MSR; here we just give 
some key result^] Whereas, the data processing ingredients, 


Although the asymptotic convergence in RMTs is considered under infinite 
dimensions, the asymptotic results are remarkably accurate for relatively 
moderate matrix sizes such as tens. This is the very reason why RMTs can 
handle practical massive systems. 
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including linear eigenvalue statistic (LES) designs and data 
fusion, are the major parts; they are elaborated in Sec III. 


B. Random Matrix Theories (RMTs) 

RMTs are effective to power systems, which has already 
been validated (20l . First, we assume that: for the rectangular 
N xT random matrix X, the entries are independent identi¬ 
cally distributed (i.i.d.) variables, satisfying the conditions: 

E(X.,)=0, E(X iJ X m J = 5 i , m 6 j , n a 2 (1) 


where o is the variance, and 5 a ,p is the Kronecker Delta Function: 



V/3 


1) Ring Law: 

Consider a L independent matrices product Z = nf =1 Xu, t , 
where X„, G C NxN is the singular value equivalent l25l of 


X (see (19)); X is obtained directly from raw data X (see 
(T8]Q. Furthermore, the matrices product Z is converted into 
Z (see (20)). Thus, the empirical spectrum density (ESD) of 
Z converges almost surely to the same limit given by 


Pring (^) 




otherwise 


( 2 ) 


as AT, T -A oo with the ratio c = N/T G (0,1]. 

2) Marchenko-Pastur Law (M-P Law): 

M-P Eaw describes the asymptotic behavior of the the 
covariance matrix: 


S=iXX ff e C NxN (3) 


where X is the rectangular NxT non-Hermitian random matrix 
satisfying condition 0- 

Then, the ESD of S converges to the distribution of M-P 
Law (2611271 with density function: 


„ m _ f - A )( A “ a -) 

mp 1 0 , otherwise 

(4) 

where a±=cr 2 ( 1 ± \Ic) 2 . 

3) Mean Spectral Radius (MSR): 

MSR is a high-dimensional indicator. For a specific matrix 
Z (as described in Ring Law part above), we can calculate the 
eigenvalues Az on the complex plane. The mean value of all 
these eigenvalues’ radii length is denoted as r MSR : 


=EiLi£|A*,i| 


(5) 


2) Law of Large Numbers: 

The law of Large Numbers is the first step in studies of 
eigenvalue distributions for a certain random matrix ensemble. 
The result, for the Wigner ensemble, obtained initially in f30l . 
was improved in ED, where the Stieltjes transformation was 
introduced and the famous semicircle law was shown under the 
minimal conditions on the distribution of W (the Lindeberg 
type conditions) (29). The law of Large Numbers tells us that 
N-WnM converges in probability to the limit 

lim ^A/'j V [<p]= [ <p(\)p(\) dX (7) 

TV—>• OO J 


where p( A) is the probability density function (PDF) of the eigen¬ 
values. In particular, for the Gaussian orthogonal ensemble 
(GOE) [28] (see (21),(22), and (23) in the appendix), p( A) 
is according to the semicircle law: 


Psc{^) 


2iGV4w 2 - A 2 A 2 < 4w 2 
0 A 2 >4 co 2 


( 8 ) 


The covariance matrix ensemble is another classical type ; 
M-P Law, as describe in Sec II, is adapted to this ensemble. 
The covariance matrix is widely used in engineering due to 
the rectangular form—we can also study the RMM X G C NxT 
with N=fT. We will further discuss this ensemble below. 

3) Central Limit Theorems (CLTs): 

Central Limit Theorems (CLTs), as the natural second step, 
aim to study the LES fluctuations. Lots of papers devote 
to proofs of CLTs for different random matrix ensembles 
(see (28l [29l [321436) ). CLTs for LESs with polynomial test 
functions of some generalizations for the Wigner and covari¬ 
ances matrices were proved in (35) via moment methods. In 
contrast, CLTs for LES with real analytic test functions of 
the Wigner and covariances matrices were established in El 
under additional assumptions that 

E(W i /) = 2,E{W i /) = 3E 2 {W i /) = 3 Wigner 
E(2Qj 4 ) =3E 2 (Xij 2 ) Covariance 

In the recent paper (28) . CLTs for LESs of the Wigner 
and covariances matrices were proved under assumptions that 
E(Wi^ 2 ) = 2, the third and the forth moments of all entries are 
the same, but E (Wij 4 ) is not necessary 3. Moreover, the test 
functions are not supposed to be real analytic. It was assumed 
that the Fourier transformation (p satisfies the inequality 

J(1 + \k\ 5 )\fi(k)\dk < oc 

which means that ip has more than 5 bounded derivatives. 


III. Data Processing Ingredients 


B. CLT for Covariance Matrices 


A. Linear Eigenvalue Statistics and Related Research 


1) Definition: 

The linear eigenvalue statistic (LES) of a matrix X G C NxN 
is defined via the continuous test function ip : M -A C [28 , © I 

•Vvb] = E^IiAAi) (6) 


Parallel to ([3]), we study another typical covariance matrix: 

rH _ lc /- rr*NxN 


M=AxX ff = ^SeC 


its PDF is also according to M-P Law: 

1 2¥A^V / ( a + _ A)(A —a_) ,a^X^b 

1 0 , otherwise 


Prrrp2 (A) 


(9) 


( 10 ) 
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where a± —cr 2 (l d= 1 /\fc) 2 , and a is the variance. 

The CLT for M is given as follows [29]: 

Theorem III.l (M. Sheherbina, 2009). Consider a rectan¬ 
gular NxT non-Hermitian random matrix X, with entries 
X id satisfying the condition ([71); M is the covariance matrix 
(see |9]) ). Let the real valued Jest function ip satisfy condition 
IMI 3 / 2 -H: < °° {s > 0 ). Then A /jv° [<£>]> in the limit N,T 
00 , c = 7V/T < 1, converges in the distribution to the Gaussian 
random variable with zero mean and the variance: 


Vsc M = JJ ^ 2 (^ 1 ^ 2 ) (1 - sinsin 62) d0id6 2 


-f <01,02<f 


+ ^| [ / <^(C (0)) sin 000^ 

(11) 

where ^ {61,62) — ~ 02 , and £ (0) — 1 + l/c+2/v / csin0; 

[C( 0)] 0 = 0 2 

= E (X 4 ) — 3 ij 4-^/z cumulant of entries o/X. 


(/- 


C. Designs and Theoretical Values 

For Gauss variable X, E(X) = 0, E(X 2 ) = 1, and E(X 4 ) = 3 
(see ([24])). A typical scenario is assumed: TV = 118 and 
T = 240, thus c = N/T = 0.4917; 

1) LES for Ring Law: 

MSR (see Sec II) is a special LES 0 it is defined as follows: 

N 1 

tmsr = 5Tjvl A z,J ( U ) 

i= 1 


where A^ i (z« 1,2 r .., N) are the eigenvalues of Z, and |A^ J is 
the radius of A^ i on the complex plane. 

According to ([ 7 ]), the theoretical expectation of r when N —>> 
00 (E(t msr )), are calculated as follows: 


E(t msr ) = E(r)= // Area P(r)xr-rdrd6» 

= C f 1 /- j— —r-rdr d# = 0.8645 

40 JVl _c C7T 

where P(r) is given in formula ([5]), and c= 0.4917 for this scenario. 
Also, we can calculate the theoretical variances (B(t msr )): 

E(r 2 ) = // Area E(^) xr 2 -r drd0 = 0.7542 
D(t msr )= D(r)= E(r 2 ) - E(r) 2 = 0.0068 (J4) 

2) /6>r Covariance Matrices: 

1 . Chebyshev Polynomials p(A) = T 2 = 2x 2 — 1 

N 

t T2 = 2V 2 - 1) (15) 

i=l 

For the scenario, according to 0 and ( [To] ), we get E(tt 2 ): 

E ( r r 2 ) = XS)pmp2(S) dx = 6600 (16) 

and according to we get B(tt 2 ): 

B(t T2 ) = 1080 (17) 


2 Since A^ i are highly correlated random variables (each one is a com¬ 
plicated function of the random matrices (i = 1 , 2 ,..., L)), tmsr is a 
random variable. 


Similarly, we can design other test functions to obtain 
diverse LESs as the indicators, and some theoretical values; 
here we list some classical test functions: 

2. Chebyshev Polynomials: X 3 = 4x 3 — 3x 

3. Chebyshev Polynomials: T 4 = 8 x 4 — 8x 2 + 1 

4. Determinant: DET = \n(x) 

5. Likelihood-ratio function: LRF = x — ln(x) — 1 

In Sec V, we will show their applications in smart grids. 

D. Universality Principle 

Akin to CLTs, universality ED refers to the phenomenon 
that the asymptotic distributions of various covariance matrices 
(such of eigenvalues and eigenvectors) are identical to those 
of Gaussian covariance matrices. These results let us calculate 
the exact asymptotic distributions of various test statistics 
without restrictive distributional assumptions of matrix entries. 
The presence of the universality property suggests that high¬ 
dimensional phenomenon is robust to the precise details of the 
model ingredients 1571 . For example, one can perform various 
hypothesis tests under the assumption that the matrix entries 
not Gaussian distributed but use the same test statistic as in 
the Gaussian case. 

The data of real systems can be viewed as a spatial and 
temporal sampling of the random graph. Randomness is intro¬ 
duced by the uncertainty of spatial locations and the system 
uncertainty. Under real-life applications, we cannot expect 
the matrix entries follow i.i.d. distribution. Numerous studies 
based on both simulations m and experiments, however, 
demonstrate that the Ring Law and M-P Law are universally 
followed. In such cases, universality properties provide a 
crucial tool to reduce the proofs of general results to those 
in a tractable special case—the i.i.d. case in our paper. 

E. Data Fusion 


Data fusion (including the augmentation, the blocking, the 
sum, and the product of matrices) is another data processing 
ingredient. Comparing to the LES designs, which aim to define 
the LES r via the test functions ip{A) for a determinate 
X, data fusion manages to handle multiple data sources 
(i.e. Xi,X 2 ,'-'X even with diverse features (e.g. in totally 
different size). The theories about data fusion are deep and 
novel: Gotze, Kosters, and et al, in [38-401, have already 
studied the performance of the matrices in the form of : 

p(!) _L_I pO) 

3 - n ' ' ■ L n 


where F W and xi°’ 0) , • • • , ■ ■ ■ , xi m ’ fc) 

are independent nxn matrices with independent entries. 

We will not go so far and just show some typical ap¬ 
plications; data fusion is very common and meaningful in 
engineering. Our previous work [20] conducted data fusion as 
follows: the status data source (a matrix B in size of N xT), 
and the factor data source (a vector c in size of lxT), in a 
certain manner, are put together to form the augmented matrix 
A (A = [B;C]); this A, as a new random matrix model, is used 
to conduct correlation analysis. Besides, Zhang [211, using the 
data from a 70 nodes network testbed, validated the data fusion 
in the field of signal detection for Massive MIMO systems. 
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IV. Unsupervised Learning Method 

This section makes a comparison between the unsupervised 
learning methods (e.g. Ring Law Analysis) and the supervised 
ones (e.g. PC A) from diverse perspectives. 

A. Procedure of Ring Law Analysis 

The Ring Law Analysis conducts EED with following steps: 


Steps of Ring Law Analysis 

1) Select arbitrary raw data (or all available data) as the data source ST 

2) Forming random matrix model X at a certain time U; 

3) Obtain Z by variable transformations (X —> X —X n — Z —Z); 

4) Calculate eigenvalues A^ and plot the Ring on the complex plane; 

5) Conduct high-dimensional analysis; 

4a) Observe the experimental ring and compare it with the reference one; 
4b) Calculate r MSR as the experimental value; 

4c) Compare r MSR with the theoretical value TVE(r); 

6) Repeat 2)-5) at the next time point (L=ti + 1); 

7) Visualize r MSR on the time series; 

8) Make engineering explanations. 


Steps 2)-7) conduct high dimensional analysis only us¬ 
ing raw data, and then visualize the indicator r; they are 
unsupervised statistical proceedings without assumptions and 
simplifications. In step 2), for different purposes, arbitrary raw 
data, even ones from distributed nodes or intermittent time 
series, are able to be focused on to form the RMM X. It 
is also an online data-driven method requiring no knowledge 
of the physical models/topologies. In addition, the size of 
X is controllable during step 2); it relieves the curse of 
dimensionality in some ways. 

B. Procedure of Principal Component Analysis (PCA) 

PC A is a prevailing data-driven method l24l . Xie proposed a 
dimensionality reduction method based on PCA as a approach 
to EED in smart grids OH; the steps are listed as follows: 


Steps of PCA _ 

1) Select data Y = [y 1 ,y 2 ,--- ,y N ) & C nxN = ■ ■ ,t/ nji ] T ; 

2) C Y =' Y h Y e C NyN , calculate A(C Y ) : 

3) Rearrange and select the top m eigenvalues: Ai—^p l5 • • • , A m —^p m ; 

4) Form m dimensional principal component subspace L(p 1 ,p^,- • • ,p m ) 

and project the original N variables onto it: Select m' ^ m vector- 
based variables as the pilot PMUs from N PMUs to form the linear 
basis matrix Y s = [y 6l ,y te ,-• • ,-y^,] e C nXm . The selected m' 
variables should be as orthogonal to each other as possible, which means 
min (cos 6) = (y hi -y bj )/{\y bi \\y bi \) = i^j) 

5) Represent non-pilot PMUs y ci (i = 1, 2,- • N — m') for training : Let 
v cz = [vi,ciW2,cir • • Wm' ,ci ] T t> e the vector of regression coefficients: 

Yci = Yw) • ( V l,ciW2,ciC ■ ■ ,Vm',ci))=V B -Vci 

"-V-' 

Basis 

v C i = (Yb h Yb ) -1 Yb 11 y c j; 

6) Train at t s =s — 1 : 

7) Judge at t = s : 

||y c /,Y s s v ci (- * 1 2 3 4 5 6 7 )||. 


Step 6) and step 7) are the executive parts based on data 
processing procedure— steps l)-5). The key steps are step 4) 
and step 5); they constitute the training procedure. By choosing 
m' PMUs (as pilot PMUs) from total N PMUs, the procedure 
tags the system in a reduced subspace (N—tm!) ; in this way, 
the function v c ^ is inferred. 

C. Comparison of Supervised and Unsupervised Methods 

The supervised learning methods are prevailing among 
massive systems. The key idea for these methods is to tag 
the systems—by labelled parameters, principal eigenvectors, 
inferred functions, etc; the detailed steps, by dealing with 
prior models and data in a low dimensional space, make up an 
empirical training procedure. In a word, supervised learning 
methods are of ’labelling’ steps based on assumptions and 
simplifications. For supervised learning methods, there are 
some problems, essentially, hard to be solve: 

a) . The error accumulations, spurious correlations, incidental corre¬ 
lations are unavoidable as systems grow large; there does not exist a 
solid mechanism to eliminate or to relieve them. 

b) . The training procedure is not strict. Taking paper (Hi for an 
example, it chooses the pilot PMUs in an unconvinced way: the 
most unrelated ones, rather than the best suitable ones, are chosen. 
Some additional devices, especially the empirical ones, are essential 
as mentioned in the paper—e.g., the PMUs of some topologically 
and physically significant buses should be pilot ones, while the PMUs 
which are historically eventful should not. Additionally, the improper 
setting of the pilot PMUs total number m\ or the pre-specified 
variance threshold c, will make the result worse. 

c) . The bad data in the core area (e.g. the incomplete, the inaccurate, 
and the unavailable data), or the worst situation—loss of all the data 
of the event area, will almost disable the methods. 

d) . Moreover, it is impossible to train over all the events or scenarios. 
There must be something unexpected in the large scale systems, even 
of which we can not give a proper description in low dimensions. Just 
a simple intuitive example: we can easily deduce —y ci —— Y B v C i 
from y ci = Y B v C i- Whereas, it is hard to make a verdict that the 
simultaneous reverse of the pilot PMUs data Y B and the non-pilot 
PMUs data y ci is an anomaly or not. 

In general, traditional model-based methods or data-driven 
supervised learning methods are in low dimensions; they 
highly depend on only a few parameters related to physical 
models, subjective hypotheses, empirical causality, training 
procedure, etc. 

On the other hand, data-driven unsupervised ones are 
usually based on high-dimensional statistical models which 
are built on solid mathematical foundations. In other words, 
the related theories, algorithms, and statistics (e.g. RMTs 
and LESs) are always based on probability and designed in 
a multi-dimensional space. Thus, the unsupervised methods 
tend to analyze the interrelations and interactions (seen as 
correlations) directly using the raw data without any label; 
they are fast, sensitive, and universal. Moreover, they are pure 
statistical data processing in high-dimensions which will not 
bring in systematical errors. It comes to the conclusion that 
the unsupervised learning methods, with advantages described 
as above, perform better than the supervised ones. 
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V. Case Studies 

In this section, we use both simulated data and real ones 
to validate the proposed approach. For the simulated case, we 
adopt the standard IEEE 118-bus system f4H (shown as Figure 
[Hi- Detailed information about the simulation is referred to 
the easel 18.m in Matpower package and Matpower 4.1 Users 
Manual l42l . There are generally two scenarios in the system: 
1) only white noises, e.g., small random fluctuations of loads 
and Gaussian sample errors; 2) Signals plus noises, it means 
that there are also sudden changes, or even serious faults. 
For the real case, we use a 48-hours database of some power 
grid in China. Above all, the EED may be modeled as binary 
hypothesis testing: normal hypothesis Ho (no signal present) 
and abnormal hypothesis Hi (signal present). 


A. Simulated 118-bus System 


According to From Physical System to Random Matrix in 
Sec II, the data source fiy = Vij E M 1 18x1500 (n = 118, 
£ = 1500) is obtained to map the simulated system. 

1) Conduct EED Based on Ring Law, M-P Law, MSR: 

For details about this work, see our previous work 0. Here, 
we assume the events as Table [111} Figure [T] shows the results. 

At sampling time £ s = 600 s, the RMM X includes a time 
period Axime = 361:600 s; the noises play a dominant part 
during the period. Figure [la| shows that the distribution of Az 
is more closely to the reference ring (L = 1); and Figure [lc] 
shows that the distribution of A M (in blue bars) fits the M- 
P distribution (in blue line) quite well. Whereas, at sampling 
time £ s = 601 s, Figure [Tbl shows that the eigenvalue points 


collapse to the center point of the circle, which means that 
the correlations of the data have been enhanced somehow; and 
Figure [Td| also shows the deviation between the experimental 
distribution and the theoretical one. 


From 


£ curve in Figure le it is observed that 


starts to decrease (0.8665,0.6308, •• • ,0.4927) at £ = 600 s, 
just when the event (,sudden change of Pb U s- 52 ) occurs as the 
signal. The influence lasts for full time length (T = 240 s) and 
the decreasing lasts for half (120 s); thus, a ”U”-shaped curve 
is observed. In this way, we conduct anomaly detection; the 
time for the beginning point of ”U” (£ = 600 s for this case) 
is right the anomaly start time. 

2) LES Designs: 

Designing LESs is a major target in this paper; here, we 
study diverse LESs with different test functions. Keeping 
T = 240 s, we can divide the temporal space into 5 subspaces 
(stages) according to the status of Pbus -52 as follows: 



Time Areas (time length) 

Descripiton 

SI 

241 s- 

600 s 

(360 s) 

fluctuations around 0 M 

S2 

601 s- 

840 s 

(240 s) 

a step signal 

S3 

841 s- 

1200 s 

(360 s) 

fluctuations around 300 M 

S4 

1201 s- 

1306 s 

(106 s) 

a ramp signal 

S5 

1307 s- 

1500 s 

(194 s) 

static voltage collapse 


The results of the LESs, designed according to LES Designs 
and the Theoretical Values in Sec III, are shown as Table [nj 




(a) Ring Law at t s = 6 00 s (b) Ring Law at t s = 601 s 



Tlength=240 N=118 ||Ver=28 



(e) MSR on Time Series 


Fig. 1: Ring Law, M-P Law and MSR for Simulated Data V 


TABLE II: LESs and their Values 


1 

1 MSR 

t 2 

Ts 

t 4 

DET 

LRF 

EO: Theoretical Value 

E(r) 

0.8645 

1.34E3 

1.01E4 

8.35E4 

48.3 

73.68 

D(t) 

0.0068 

6.65E2 

9.35E4 

1.30E7 

1.32 

1.42 

Cv 

0.0954 

0.0193 

0.0304 

0.0432 

0.0238 

0.0162 


SI: Only small fluctuations around 0 MW 



0.8648 

1.33E3 

9.93E3 

8.19E4 

73.68 

73.3 

a 2 (r) 

0.0080 

6.53E1 

2.20E4 

4.67E6 

0.406 

0.322 


S2: A step signal (Pbus-52- 0 MW — >• 300 MW) is included 


a 2 (r) 

0.5149 

0.0788 

1.29E4 

3.30E6 

1.92E6 

1.51E11 

3.04E8 

5.66E15 

-174 

890 

295 

893 

S3: Only small fluctuations around 300 MW 

a 2 ( T ) 

0.8141 

0.0250 

1.54E3 

1.81E2 

1.73E4 

3.30E5 

2.89E5 

4.20E8 

27.9 

0.507 

93.1 

0.419 

S4: A Ramp signal as the system incoming 

m(t) 

a 2 (r) 

0.6448 

0.0571 

6.43E3 

7.20E6 

6.40E5 

1.68E11 

7.54E7 

3.29E15 

-61.4 

1.74E3 

182 

1.73E3 

S5: Static voltage collapse for the system 

a 2 (r) 

0.4136 

0.1076 

7.49E3 

7.47E6 

6.48E5 

2.99E11 

7.25E7 

1.02E15 

-598 

4.16E4 

719 

4.16E4 


*c v = ^/E(t)/E(t) is the coefficient of variation 
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The above results validate that it is feasible to analyse the 
power system using the random matrix model (RMM); the 
upper layer should be ontology and we do not go that far. 
Here, we just make some engineering statements/analyses: 

1. Independent of the system models/topologies, we can 
design the LES r. For a RMM X with determinate size 
N x T, if the test function p(A) is given, the LES r is 
obtained; some related theoretical values (the expectation 
E(t), the variance B(r), and the coefficient of variation c v ) 
are able to be calculated as well. 

2. Among p( A) given above, LRF performs best in the view 
of c v (Low c v means high precision and repeatability of the 
assay |43); here, we regard 12% as the upper bound). 

3. We can make a summary about the performance of the ex¬ 
perimental values (mean fi{r) and variance cr 2 (r)) comparing 
to the theoretical ones (expectation E(r) and variance B(r)): 

a) . During SI, p(r) is close to E(r); cr 2 (r) is much less than D(r); 

b) . During S3, p(r) has a little bias (i.e. | p(r) — D(r)|), but more 
than SI; cr 2 (r) is acceptable (yf a 2 (r) / p(r) < 12%); 

c) . For S2, S4, and S5, /i(r) has much more bias; cr 2 (r) is always 
too big to be accepted; 

d) . variance ct 2 (t) is much more sensitive than mean /x(r). 

Then, we can conjecture that: 

a) . The more stable the system is, the more effective the theoretical 
values become Gu(t) is close to E(r); ct 2 (t) is less than D(r)); 

b) . Different test functions p(\) have different characteristics and 
functions. In this sense, we can balance the reliability and sensitivity 
for anomaly detection in a special system; 

c) . In addition, a test function is akin to a filter in some sense; beyond 
event detection (distinguish signals from noises), it has the potential 
to trace a specific anomaly (distinguish the signal from others); 

d) . For a special purpose, e.g. the lowest c v or the lowest bias, there 
should exist an optimal combination of the Chebyshev Polynomials 
as the test function. 

B. Advantages of LES and Visualization Using 3D Power-Map 

A LES is an indicator in high dimensions; its value depends 
on a wide sample data in the form of the entries of the RMM 
X (Sec IV). As a result, LES r, as a statistical indicator, is 
sensitive and robust against bad data; these advantages will be 
validated via the visualization of r using 3D power-map later. 

For this case, we keep the hypothetical scenario as Table 
[m| and take w as the indicator. We obtain the regional r LRF 
for each regior^j in a similar way to that for the whole system 
(118 nodes); thus, the t lrf -£ curves are plotted as Figure [2] 
We denote ?7 = t lrf /E (r LRF ) as the high-dimensional indi¬ 
cator. For each time point, with an interpolation method m, 
a 3D map is able to be plotted. Figure [3j [5] depict some key 
frames of the 3D power-map animation with r /, whereas Figure 
m with the raw data V: 

a) For Figure [3j at time £ = 601 s, 77 of the area around 
A3 changes relative dramatically; this trend last for the next 

3 this division depends on the specific network structure; however, a poten¬ 
tial problem lies here: how small the partition (see Figure [TT)) can be that 
keeps the theories still work. To find the answer we should study how to turn 
big data into tiny data it is another topic and we do not expand here. 
Theoretically, the bias of r is closely related to the size of RMMs X (more 
specifically, N ) 


T — 2 = 238 sampling points (i.e. £ s = 602:839 s) in the an¬ 
imation. Therefore, we conjecture that some event occurs in 
A3; even we can go further that the event is influential to Al, 
A2, A4, A5, and has little impact on A 6 . These conjectures, in 
a reasoning way, coincide with the common sense that there 
is a sudden change in A3 at £ = 601 s. 

b) For Figure [3j with sustainable growth of power demand at 
some bus (Pbus -52 for this case), the whole system becomes 
more and more vulnerable. The vulnerability can be estimated, 
before the system has a breakdown due to voltage collapse, 
via the visualization of 77 . 

c) Moreover, if the most related data (data of A3 for this 
case) are lost somehow, hardly any information can be gotten 
by V as Figure [ 6 ] whereas the proper judgements can still be 
achieved by 77 as Figure [5] 

In general, the combination of high-dimensional indicators 
(e.g. 77 ) and 3D power-map is really a novel and feasible 
approach to EED in smart grids. 


C. Conduct EED Using Real Data 

This case is based on the operation data of a certain 
interconnected power grid in China; the data lists are shown 


as Figure 11 The data source includes data of substations, 
breakers, lines, buses, generators, frequencies, etc. These data 
are from 6 administrative regions—5 provincial ones and 1 
directly affiliated one. The sampling frequency is once per 
minute and the sampling lasts for 3 days (4320 minutes). 

1) Model—Size 42 x 90, voltage data, r MSR : 

We only focus on a regional bus voltage data with 42 nodes; 
thus, the data source Q is of 42x4320 = 181440 sampling 
voltage data. Here, we choose r MSR as the indicator. The RMM 
X at each sampling point is modeled in size of 42 x 90; the 
time length of the split-window is 90 (T = 90 m, and T/2 = 45 
m). Figure [ 7 ] shows the result. 


Tlength=90 N=42 Block=3||Ver=165 



Sample Time (Min) 

Fig. 7: t msr -£ Curve for Regional Bus Voltage Data 

In Figure [ 7 ] we can extract some half ”U”-shaped curves 
according to the legends (the red for start points and the purple 
for end ones). As our analyses above, the start points corre¬ 
spond to the right time when the event occurs. They are almost 
£ = 06 : 55 , maybe for the working start, and £ = 10 : 50 , 
maybe for the lunch break, respectively. 
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Tlength=240 N=11 ||Ver=65 Tlength=240 N=33 ||Ver=66 Tlength=240 N=19 ||Ver=67 



Sample Time(s) Sample Time(s) Sample Time(s) 


Fig. 2: r LRF -t Curve for Single Partitioning: A1-A6 



(a) t s = 600 s 



(a) t s = 600 s 


: - fA5i 

[A6l ~ 


[ah - 

i 

ES a, 


|T 7 


(a) t s = 600 s 


ii^lf a fi -I'Ti; 


(b) = 601 s (c) i s = 720 s (d)t 8 = 1198 s (e)* s = 1250 s 

Fig. 3: Visualization of the high-dimensional indictor r] with Full Data Sets 








(b) =601 s (c) t s = 720 s (d) t s = 1198 s (e)t s = 1250 s 

Fig. 4: Visualization of the Voltage V with Full Data Sets 



i5 


M 


n 


(b) =601 s (c) t s = 720 s (d) t s = 1198 s (e)t s = 1250 s 

Fig. 5: Visualization of the high-dimensional indictor r] without Data Set of A3 


(f) t s = 1350 s 



I 

m 


(f) ts = 1350 s 




(f) ts = 1350 s 



(b)* s =601s (c) t s = 720 s (d) t s = 1198 s (e)t s = 1250 s 

Fig. 6: Visualization of the Voltage V without Data Set of A3 



(f) t s = 1350 s 


(a) t s = 600 s 




































































































































9 


2) Model — T — 90 m, power flow data, r LRF : 

We focus on the power flow data (687 lines); we choose r LRF 
and keep T = 90 m. Figure [8] shows the results; some special 
time points, such as t = 4 : 44 and t = 10 : 42 , are able to be 
observed. For each administrative region, we conduct a similar 
procedure and Figure [9] shows the results. 


t 1= x lrf /6 ( x lrf> 


Global Observation (AO): Tlength=90, N=687 



Fig. 8: rj-t Curve for Global Power Flow Data 



Fig. 9: rj-t Curves for Distributed Power Flow Data 

In Figure [8] we regard the bottom part (the purple) but not 
the top as the start time. That is because the ”U”-shaped curves 
for the r MSR (Figure [I]) and the r LRF (Figure [2]) are reversed. 

3) Summary: 

From both models above, some key time points, as well 
as the daily periodicity, are able to be observed. The bus 
voltage and the power flow have different engineering mean¬ 
ings essentially: the former is closely related to local loads, 
whereas the later is to power exchanges between connected 
regions. This may be the reason why their key time points 
are different. Moreover, we believe that the information loss 
during the processing proceeding in high dimensions should 
be much less than in low dimensions, even negligible with 
a proper designed LES (the LES r is almost arbitrary; it 
is only related to the test function p{\)). In our opinions, 
the integration of statistic analyses in high dimensions and 
engineering explanations (see Procedure of Ring Law Analysis 
in Sec IV) are the two procedures to mine the potential values 
from the complex 4Vs data. More work (including more real 
data) are essential for further mining. 

VI. Conclusion 

This paper proposes a data-driven unsupervised learning 


method for early event detection (EED). Based on random 
matrix theories (RMTs) as our mathematical foundations, the 
linear eigenvalue statistics (LESs) are proposed as the key 
indicators. In addition, we compare the proposed method with 
a supervised one (a dimensionality reduction method based 
on PC A). Case studies, with both simulated data and real 
ones, validate the effectiveness and the advantages of the 
proposed method. Especially, its robustness against bad data 
is highlighted—the combination of 3D power-map and high¬ 
dimensional indicators, even with data source loss in the core 
area, is still able to conduct EED effectively. 

This work is another attempt towards applying big data to 
power systems. It mainly argues that random matrix models 
(RMMs) are a proper tool to reveal physical systems in high 
dimensional perspectives. This enables us to mine the potential 
values from the complex 4Vs data resources efficiently. Three 
main steps are essential as the procedure: 1) to build the RMM 
with raw data; 2) to conduct high-dimensional analyses with 
statistical transformations; 3) to interpret the results to human 
beings. The proposed data-driven unsupervised approach is 
universal, as it mainly rely on raw data which are independent 
of empirical models or labelled parameters; it is also sensitive 
since it uses the high dimensional statistics as the indicators. 
Moreover, for some data processing ingredients, the theoretical 
values are predictable by the latest theorems. 

However, our findings only make up a tiny fraction of 
high-dimensional analytics. More problems need to be further 
studied: 1) how to design the test function as a filter to 
detect potential anomalies in real systems {Sec III, V); 2) how 
to choose data source or make data fusion to meet certain 
engineering requirements {Sec ///); 3) how to explain the high¬ 
dimensional findings to human beings {Sec IV, V); 4) how 
to turn big data into tiny data {Sec V). One wonders if this 
direction will be far-reaching in years to come toward the Big 
Data Age for smart grids. 

Appendix A 
Formulas for RMT 

A. Convert X G C NyT to X G C NxT : 

= ( 18 ) 
where x 4 ={x iA ,x i 2 , ■ ■ ■ ,x i T ) and /i(x i ) = 0, a 2 (x i ) = l. 

B. Convert X G C NyT to X M G C NxN : 


x u = yxiPu (i9) 

where U € C NxN is a Haar unitary matrix; X„X„ H = XX H . 

C. Convert Z G C NxN to Z G C^: 

z t = z i /(v / Vcr(zJ) (20) 

where z t = (z^z^, ■■■ , z iN ), Z = ]lf=i x «,i- 
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Appendix B 

Definition of the Gaussian orthogonal ensemble 
(GOE) 

This is a real symmetric nxn random matrix: 

M = n- 1 ' 2 W, W = {W jtk e K, W jtk = W kJ }l k=1 (21) 

defined by the probability law: 

JJ dW hk (22) 

l^j^k^n 

where Z n \ is the normalization constant. Since: 

TrW 2 = ]T Wjf I 2 £ W jtk 2 

1 ^j<k^n 

the above implies that {^,/c}i<j</c<n are independent Gaus¬ 
sian random variables such that: 

HWj,k) = o E(tr iife 2 ) = w 2 ( 1 + 5 jtk ) (23) 
Appendix C 

Expectation of the Random Variable with 
Gaussian Distribution 

X is a random variable with standard normal distribution; 
it is described by the probability density function (PDF): 

/w = WX" 

1) E(X),E(X 2 ),E(X 4 ): 


2) / 0 °° e * 2 dt, / 0 °°i 2 e * 2 dt: 


Suppose: 




Appendix D 




Fig. 10: Partitioning network for the IEEE 118-bus system. 
There are six partitions, i.e. Al, A2, A3, A4, A5, and A6. 


Appendix E 


TABLE III: Series of Events 


Par\A Time 

[001 : 600J 

[601 : 1200J [1201 : 1500J 

Pbus-52 (MW) 

0 

300 t - 900 


(24) 


*Tbus -52 is the power demand of bus-52 
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Appendix F 



Fig. 11: The structure of the data lists for a certain power grid 
in China 
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